 translation service


Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent

Zhang, Yangshijie, Wang, Xinda, Liu, Jialin, Wang, Wenqiang, Ma, Zhicong, Jia, Xingxing

arXiv.org Artificial Intelligence

With social media growth, users employ stylistic fonts and font-like emoji to express individuality, creating visually appealing text that remains human-readable. However, these fonts introduce hidden vulnerabilities in NLP models: while humans easily read stylistic text, models process these characters as distinct tokens, causing interference. We identify this human-model perception gap and propose a style-based attack, Style Attack Disguise (SAD). Experiments on sentiment classification and machine translation across traditional models, LLMs, and commercial services demonstrate SAD's strong attack performance. We also show SAD's potential threats to multimodal tasks including text-to-image and text-to-speech generation.
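
The perception gap described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' SAD implementation: it simply assumes that one kind of "stylistic font" is the Unicode Mathematical Bold alphabet, and the helper name `stylize` is hypothetical. The styled string looks like ordinary text to a reader but consists of entirely different code points; Unicode NFKC normalization (shown here only as an illustration, not as a defense proposed in the paper) folds those code points back to ASCII.

```python
import unicodedata

# Start of the bold letters in the Unicode "Mathematical Alphanumeric Symbols" block.
BOLD_UPPER_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER_A = 0x1D41A  # MATHEMATICAL BOLD SMALL A


def stylize(text: str) -> str:
    """Replace ASCII letters with Mathematical Bold look-alikes (hypothetical helper)."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER_A + ord(ch) - ord("a")))
        else:
            out.append(ch)  # spaces, digits, punctuation stay unchanged
    return "".join(out)


plain = "great movie"
styled = stylize(plain)

# NFKC applies compatibility mappings, which fold the bold letters back to ASCII.
restored = unicodedata.normalize("NFKC", styled)

print(styled)
print(styled == plain)    # the styled text is a different string to a model
print(restored == plain)  # normalization recovers the plain text
```

A tokenizer that has rarely seen these code points will typically split them into unfamiliar tokens, which is the kind of interference the abstract refers to.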


PDFMathTranslate: Scientific Document Translation Preserving Layouts

Ouyang, Rongxin, Chu, Chang, Xin, Zhikuang, Ma, Xiangyao

arXiv.org Artificial Intelligence

Language barriers in scientific documents hinder the diffusion and development of science and technologies. However, prior efforts in translating such documents largely overlooked the information in layouts. To bridge the gap, we introduce PDFMathTranslate, the world's first open-source software for translating scientific documents while preserving layouts. Leveraging the most recent advances in large language models and precise layout detection, we contribute to the community with key improvements in precision, flexibility, and efficiency. The work has been open-sourced at https://github.com/byaidu/pdfmathtranslate with more than 222k downloads.


AI dubbing opens up new languages with Adobe Firefly

PCWorld

With the content we create now reaching a global audience, it's essential that local languages are used if you want your message to be understood. Up until recently, this has required using professional translation services which take time and money. But, thanks to the powerful AI capabilities of Adobe Firefly, you can quickly and easily turn your English-speaking audio into Spanish, French, Chinese or many other languages with only a few clicks. Here's how Adobe Firefly can make sure you're heard all around the world. Firefly is Adobe's generative AI solution, which allows users to create or enhance images, videos and audio, as well as translate languages.


Mind the Language Gap in Digital Humanities: LLM-Aided Translation of SKOS Thesauri

Kraus, Felix, Blumenröhr, Nicolas, Tonne, Danah, Streit, Achim

arXiv.org Artificial Intelligence

We introduce WOKIE, an open-source, modular, and ready-to-use pipeline for the automated translation of SKOS thesauri. This work addresses a critical need in the Digital Humanities (DH), where language diversity can limit access, reuse, and semantic interoperability of knowledge resources. WOKIE combines external translation services with targeted refinement using Large Language Models (LLMs), balancing translation quality, scalability, and cost. Designed to run on everyday hardware and be easily extended, the application requires no prior expertise in machine translation or LLMs. We evaluate WOKIE across several DH thesauri in 15 languages with different parameters, translation services and LLMs, systematically analysing translation quality, performance, and ontology matching improvements. Our results show that WOKIE is suitable to enhance the accessibility, reuse, and cross-lingual interoperability of thesauri by hurdle-free automated translation and improved ontology matching performance, supporting more inclusive and multilingual research infrastructures.


Charles Translator: A Machine Translation System between Ukrainian and Czech

Popel, Martin, Poláková, Lucie, Novák, Michal, Helcl, Jindřich, Libovický, Jindřich, Straňák, Pavel, Krabač, Tomáš, Hlaváčová, Jaroslava, Anisimova, Mariia, Chlaňová, Tereza

arXiv.org Artificial Intelligence

We present Charles Translator, a machine translation system between Ukrainian and Czech, developed as part of a society-wide effort to mitigate the impact of the Russian-Ukrainian war on individuals and society. The system was developed in the spring of 2022 with the help of many language data providers in order to quickly meet the demand for such a service, which was not available at the time in the required quality. The translator was later implemented as an online web interface and as an Android app with speech input, both featuring Cyrillic-Latin script transliteration. The system translates directly, unlike other available systems that use English as a pivot, and thus takes advantage of the typological similarity of the two languages. It uses the block back-translation method, which allows for efficient use of monolingual training data. The paper describes the development process, including data collection, implementation, and evaluation, mentions several use cases, and outlines possibilities for the further development of the system for educational purposes.
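
The abstract does not spell out how block back-translation works. As a rough sketch, assuming the method amounts to feeding the trainer alternating blocks of authentic parallel data and synthetic (back-translated) data rather than a shuffled mix, a data schedule could look like the following. The function name `block_schedule`, the tags, and the block size are all hypothetical and are not taken from the paper.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def block_schedule(
    authentic: Iterable[Pair],
    synthetic: Iterable[Pair],
    block_size: int,
) -> Iterator[Tuple[str, Pair]]:
    """Yield training pairs in alternating blocks: authentic, then synthetic.

    Each yielded item is tagged "auth" or "synth" so the caller can see
    which kind of block it came from. Iteration stops when both inputs
    are exhausted.
    """
    auth_it, synth_it = iter(authentic), iter(synthetic)
    while True:
        auth_block = list(islice(auth_it, block_size))
        synth_block = list(islice(synth_it, block_size))
        if not auth_block and not synth_block:
            return
        yield from (("auth", pair) for pair in auth_block)
        yield from (("synth", pair) for pair in synth_block)


# Toy placeholder data, not real Ukrainian or Czech sentences.
authentic = [("uk-1", "cs-1"), ("uk-2", "cs-2"), ("uk-3", "cs-3")]
synthetic = [("uk-bt-1", "cs-mono-1"), ("uk-bt-2", "cs-mono-2")]

tags = [tag for tag, _ in block_schedule(authentic, synthetic, block_size=2)]
print(tags)
```

In this sketch the synthetic pairs stand for monolingual target-side text paired with machine-generated source sentences, which is how back-translation makes monolingual data usable for training.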


Bootstrapping Multilingual Semantic Parsers using Large Language Models

Awasthi, Abhijeet, Gupta, Nitish, Samanta, Bidisha, Dave, Shachi, Sarawagi, Sunita, Talukdar, Partha

arXiv.org Artificial Intelligence

Despite cross-lingual generalization demonstrated by pre-trained multilingual models, the translate-train paradigm of transferring English datasets across multiple languages remains a key mechanism for training task-specific multilingual models. However, for many low-resource languages, the availability of a reliable translation service entails significant amounts of costly human-annotated translation pairs. Further, translation services may continue to be brittle due to domain mismatch between task-specific input text and general-purpose text used for training translation models. For multilingual semantic parsing, we demonstrate the effectiveness and flexibility offered by large language models (LLMs) for translating English datasets into several languages via few-shot prompting. Through extensive comparisons on two public datasets, MTOP and MASSIVE, spanning 50 languages and several domains, we show that our method of translating data using LLMs outperforms a strong translate-train baseline on 41 out of 50 languages. We study the key design choices that enable more effective multilingual data translation via prompted LLMs.
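
Few-shot prompting for dataset translation can be pictured as assembling a handful of translated exemplars ahead of the sentence to be translated. The sketch below only builds such a prompt string. The prompt layout, the function name `build_prompt`, and the example sentences are illustrative assumptions, not the format used in the paper, and no LLM is actually called.

```python
from typing import List, Tuple


def build_prompt(
    exemplars: List[Tuple[str, str]],
    source: str,
    target_lang: str,
) -> str:
    """Assemble a few-shot translation prompt (hypothetical layout).

    `exemplars` holds (English, translation) pairs; `source` is the new
    English utterance whose translation the LLM is expected to complete.
    """
    lines = [f"Translate English to {target_lang}."]
    for english, translated in exemplars:
        lines.append(f"English: {english}")
        lines.append(f"{target_lang}: {translated}")
    # The final entry is left open for the model to fill in.
    lines.append(f"English: {source}")
    lines.append(f"{target_lang}:")
    return "\n".join(lines)


exemplars = [
    ("set an alarm for 7 am", "stelle einen Wecker auf 7 Uhr"),
    ("play some jazz", "spiele etwas Jazz"),
]
prompt = build_prompt(exemplars, "what is the weather today", "German")
print(prompt)
```

In a semantic parsing setting each exemplar would also need to carry the logical form alongside the utterance; that part is omitted here for brevity.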


DeepL, the AI-based language translator, raises over $100M at a $1B+ valuation • TechCrunch

#artificialintelligence

Artificial intelligence startups, and (thanks to GPT and OpenAI) specifically those helping humans communicate with each other, are commanding a lot of interest from investors, and today the latest of these is announcing a big round of funding. DeepL, a startup that provides instant translation-as-a-service both to businesses and to individuals -- competing with Google, Bing and other online tools -- has confirmed a fundraise at a €1 billion valuation (just over $1 billion at today's rates). Cologne, Germany-based DeepL is not disclosing the full amount that it's raised -- it doesn't want to focus on this aspect, CEO and founder Jaroslaw Kutylowski said in an interview -- but as we were working on this story we heard a range of figures. At one end, an investor that was pitched on the funding told TechCrunch that DeepL was aiming to raise $125 million. At the other end, a report with a rumor about the funding from back in November said the amount was around $100 million.


Top Language Translation AI To Watch in 2022

#artificialintelligence

When it comes to languages, many problems arise in typical translation services. Either the grammar is bad or the translation does not completely make sense afterward. It is essential that these mistakes do not slip through into the final translation, whether it's during a business transaction or simply a conversation. Luckily, technology has advanced this process with the help of automation and artificial intelligence, assisting with speed and accuracy. In this article, we will discuss some of the most prominent and up-and-coming companies that provide these automated solutions that break down the language barrier.


The business value of NLP: 5 success stories

#artificialintelligence

Data is now one of the most valuable enterprise commodities. According to CIO.com's State of the CIO 2022 report, 35% of IT leaders say that data and business analytics will drive the most IT investment at their organization this year, and 58% say their involvement with data analysis will increase over the next year. While data comes in many forms, perhaps the largest pool of untapped data consists of text. Patents, product specifications, academic publications, market research, news, not to mention social feeds, all have text as a primary component and the volume of text is constantly growing. According to Foundry's Data and Analytics Study 2022, 36% of IT leaders consider managing this unstructured data to be one of their biggest challenges.


Meta's NLLB-200 AI model improves translation quality by 44%

#artificialintelligence

Meta has unveiled a new AI model called NLLB-200 that can translate 200 languages and improves quality by an average of 44 percent. Translation apps have been fairly adept at the most popular languages for some time. Even when they don't offer a perfect translation, it's normally close enough for the native speaker to understand. However, there are hundreds of millions of people in regions with many languages – like Africa and Asia – that still suffer from poor translation services. "To help people connect better today and be part of the metaverse of tomorrow, our AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world's languages. Today, we're announcing an important breakthrough in NLLB: We've built a single AI model called NLLB-200, which translates 200 different languages with results far more accurate than what previous technology could accomplish."